SimBoost: a read-across approach for predicting drug–target binding affinities using gradient boosting machines
Abstract
Computational prediction of the interaction between drugs and targets is a long-standing challenge in drug discovery. Fairly accurate predictions have been reported for various binary drug-target benchmark datasets. However, a binary representation of interaction data has two notable drawbacks: missing endpoints for non-interacting drug-target pairs are not differentiated from inactive cases, and predicted levels of activity depend on predefined binarization thresholds. In this paper, we present a method called SimBoost that predicts continuous (non-binary) values of binding affinities of compounds and proteins and thus incorporates the whole interaction spectrum from true negative to true positive interactions. Additionally, we propose a version of the method called SimBoostQuant, which computes a prediction interval to assess the confidence of the predicted affinity, thus defining the Applicability Domain metrics explicitly. We evaluate SimBoost and SimBoostQuant on two established drug-target interaction benchmark datasets and one new dataset that we propose as a benchmark for read-across cheminformatics applications. We demonstrate that our methods outperform the previously reported models across the studied datasets.
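The drawback of binary labels described in the abstract can be illustrated with a minimal sketch. It is not taken from the SimBoost paper: the Kd values, the target names, the two thresholds, and the helper functions `pkd` and `binarize` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical dissociation constants (Kd, in nM) for one drug against four targets.
# None marks a drug-target pair that was never measured (a missing endpoint).
kd_nm = {"target_A": 5.0, "target_B": 80.0, "target_C": 12000.0, "target_D": None}


def pkd(kd_nanomolar):
    """Convert a Kd given in nM to pKd = -log10(Kd in mol/L)."""
    return -math.log10(kd_nanomolar * 1e-9)


def binarize(kd_values, threshold_nm):
    """Label a pair 1 if its Kd is below the threshold, else 0.

    Unmeasured pairs (None) also receive 0, so they become
    indistinguishable from measured inactive pairs.
    """
    return {
        target: int(kd is not None and kd < threshold_nm)
        for target, kd in kd_values.items()
    }


# Two arbitrary (assumed) binarization thresholds.
labels_30 = binarize(kd_nm, 30.0)
labels_100 = binarize(kd_nm, 100.0)

# Continuous representation: measured pairs keep their affinity,
# unmeasured pairs stay explicitly missing.
affinities = {
    target: (None if kd is None else round(pkd(kd), 2))
    for target, kd in kd_nm.items()
}

print("threshold  30 nM:", labels_30)
print("threshold 100 nM:", labels_100)
print("pKd values      :", affinities)
```

In this toy data, `target_B` switches label when the threshold moves from 30 nM to 100 nM, and the weak binder `target_C` receives the same label as the unmeasured `target_D`. The continuous pKd values preserve both distinctions, which is the motivation the abstract gives for predicting affinities as a regression target.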
Similar resources
Comparison of Different 2D and 3D-QSAR Methods on Activity Prediction of Histamine H3 Receptor Antagonists
Histamine H3 receptor subtype has been the target of several recent drug development programs. Quantitative structure-activity relationship (QSAR) methods are used to predict the pharmaceutically relevant properties of drug candidates whenever it is applicable. The aim of this study was to compare the predictive powers of three different QSAR techniques, namely, multiple linear regression ...
Forecasting sales using store, promotion, and competitor data
Sales forecasting is a common topic in business. Our task is to predict the daily sales of a famous drug company for 1,115 stores located across Germany six weeks in advance. Store sales are influenced by many factors. Our project aims to create a robust prediction model. Based on Gradient Boosting and Random Forest, our model performs well in this sales forecasting competition with resulting in r...
Resistance mechanism of human immunodeficiency virus type-1 protease to inhibitors: A molecular dynamic approach
Human immunodeficiency virus type 1 (HIV-1) protease inhibitors comprise an important class of drugs used in HIV treatments. However, mutations of protease genes accelerated by low fidelity of reverse transcriptase yield drug resistant mutants of reduced affinities for the inhibitors. This problem is considered to be a serious barrier against HIV treatment for the foreseeable future. In this st...
A comparison of random forests, boosting and support vector machines for genomic selection
BACKGROUND: Genomic selection (GS) involves estimating breeding values using molecular markers spanning the entire genome. Accurate prediction of genomic breeding values (GEBVs) presents a central challenge to contemporary plant and animal breeders. The existence of a wide array of marker-based approaches for predicting breeding values makes it essential to evaluate and compare their relative pr...